Journal of Medical Internet Research
JMIR Publications Inc.
Preprints posted in the last 30 days, ranked by how well they match Journal of Medical Internet Research's content profile, based on 85 papers previously published here. The average preprint has a 0.20% match score for this journal, so anything above that is already an above-average fit.
Yin, S.; Xin, W.; Chen, S.; Ge, Y.
Social media has become a critical channel for public health communication during the COVID-19 pandemic, yet how official health messaging aligns with broader public discourse remains insufficiently understood. This study develops an end-to-end info-veillance framework to examine the dynamic relationship between Centers for Disease Control and Prevention (CDC) communications and general public discourse on social media. We analyzed 17,524 CDC tweets and 67,895 public discourse tweets. Biterm Topic Model (BTM) was used to extract topics from each corpus, and a novel topic consistency scoring system integrating cosine similarity with daily public topic prominence was developed to quantify temporal alignment between official health communication and public discourse. Two complementary sentiment measures were incorporated: expected sentiment (average emotional tone) and net sentiment (overall emotional intensity). Temporal relationships were examined using autoregressive integrated moving average with exogenous variables (ARIMAX) models. Results show that topic alignment increased over time across CDC topics, while expected sentiment remained consistently negative. Higher alignment was associated with immediate and delayed changes in expected sentiment and stronger emotional intensity in net sentiment based on ARIMAX results. These findings suggest that topic alignment reflects public attention rather than agreement with official communications, and is associated with more negative emotional responses. This framework provides a scalable, generalizable approach to investigate and evaluate public engagement with official health communication.
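The abstract above describes a topic consistency score that combines cosine similarity with daily public topic prominence but does not give the formula. The sketch below is one plausible reading, not the authors' method: it assumes each topic is a word-weight vector over a shared vocabulary and that prominence is the share of that day's public tweets assigned to each public topic. The names `cosine` and `consistency_score` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length word-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Define similarity as 0 when either vector is all zeros.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def consistency_score(official_topic, public_topics, daily_prominence):
    """Assumed form: prominence-weighted sum of cosine similarities.

    official_topic   -- word-weight vector for one official (CDC) topic
    public_topics    -- list of word-weight vectors, one per public topic
    daily_prominence -- share of that day's public tweets in each public topic
    """
    return sum(
        weight * cosine(official_topic, topic)
        for topic, weight in zip(public_topics, daily_prominence)
    )


# Toy example: the official topic matches the first public topic exactly,
# and that public topic holds 75% of the day's public tweets.
score = consistency_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
```

Computed once per day, a score of this kind yields a time series that could then feed a temporal model such as the ARIMAX analysis the abstract mentions.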
Khan, M. M.; Anwar, M. N.
Background: Large language models (LLMs) are increasingly used in telehealth, but their safety in antibiotic prescribing remains uncertain, particularly in the presence of patient misinformation. Methods: A cross-sectional analytical study evaluated 5,000 responses from five chatbot models using 1,000 primary-care vignettes of mild infections. Guideline adherence, overprescribing, misinformation effects, and safety behaviors were assessed. Inappropriate prescriptions were classified using the WHO AWaRe framework. Results: Overall, 76.2% of responses were guideline-concordant, while 6.6% showed unprompted overprescribing and 17.2% were influenced by misinformation. Some models were more vulnerable to misinformation than others. Although most responses correctly noted that antibiotics do not treat viral infections, fewer advised consulting a doctor, and warnings against self-medication were rare. Many inappropriate prescriptions involved broad-spectrum antibiotics. Conclusion: LLMs show potential in telehealth but remain prone to misinformation and inappropriate prescribing. Stronger guideline integration and clinical oversight are necessary to ensure safe use. Keywords: antimicrobial stewardship; large language models; telehealth; antibiotic prescribing; misinformation; clinical safety
Losos, W.; Wang, B.; Fisher, K.; O'Connor, L.; Soni, A.; Gerber, B.
Background Home Test-to-Treat (HTTT) programs deliver timely antiviral treatment for acute respiratory infections, including COVID-19 and influenza, through at-home testing and telehealth. Because access is often measured by visit occurrence, variation in how and when care is delivered may be overlooked. We hypothesized that telehealth access follows distinct process-based patterns. Methods We analyzed de-identified encounters from the national HTTT program (September 2023-July 2024); 6,213 of 8,160 eligible individuals remained after exclusions for missing data. Phenotypes were derived by k-means clustering of standardized variables capturing encounter timing, modality preference, process duration, and sociodemographic and digital access attributes. Ten-day surveys assessed symptom duration and healthcare utilization. Results Three phenotypes emerged: Delayed/Disrupted Access (n = 1,537; 24.7%), Digitally Engaged but Socioeconomically Vulnerable (n = 1,460; 23.5%), and Mainstream Access and Efficient Utilization (n = 3,216; 51.8%). Mean process duration differed (15.93 [SD 3.84] vs 3.69 [3.31] vs 2.87 [2.41] hours; p < 0.001). Synchronous preference was lowest in the Digitally Engaged group (22.9%); antiviral prescribing was high (88.6%-91.9%). Among 10-day respondents (n = 1,023), symptom duration did not differ. Emergency department visits were most frequent in the Digitally Engaged group (2.3% vs 0.0% and 0.5%; p = 0.02) and urgent care in the Delayed/Disrupted group (5.8% vs 4.1% vs 2.0%; p = 0.02). Conclusions Telehealth use in a national HTTT program formed distinct phenotypes defined by timing, modality, and care-process efficiency. Evaluating equity requires attention to how and when care is delivered, not simply whether it occurred.
Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.
Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.
Küüsvek, M.; Hallik, R.; Pajusalu, M.; Kuura, A.
Background: Mental health issues are prevalent among men, yet help-seeking remains low due to stigma, masculinity norms and access barriers. Digital mental health (DMH) screening questionnaires offer opportunities for early detection, but their uptake among men is limited. Objective: This study explored the barriers and facilitators influencing men's willingness to use DMH screening questionnaires, with the aim of informing user-centered design that supports early detection and engagement. Methods: This interpretive qualitative study was conducted through semi-structured interviews with 17 purposively sampled Estonian men (aged 20-54) in a highly digitalized context until data saturation was reached. Thematic analysis followed a mixed deductive-inductive approach: deductive codes were derived from theoretical frameworks (Technology Acceptance Model, Health Belief Model, User-Centered Design, Behavioral Design), while inductive themes emerged from participants' responses across the three research questions, including their evaluations of four screening questionnaires (PHQ-2, PHQ-9, EEK-2, WHO-5). Results: Key barriers included data privacy fears, distrust of digital solutions, lengthy questionnaires, and poor user experience (UX). Facilitators were anonymity, institutional trust, short (5-10 min) questionnaires, mobile-optimized design, personalized feedback, and clear next steps. As the main contribution, four archetypes were identified: Skeptic, Self-Manager, Explorer, and Situational Seeker. They reflected distinct patterns across privacy concerns, institutional trust, user experience preferences, and help-seeking orientations. Skeptics were characterized by low institutional trust, high concern about data misuse, and a preference for anonymous, low-friction interactions, often delaying help-seeking. In contrast, Self-Managers emphasized autonomy, transparency, and evidence-based support, engaging in structured self-monitoring and purposeful help-seeking. 
Explorers showed openness to experimentation and engagement, particularly when supported by intuitive, interactive, and visually clear UX, while data sharing depended on perceived value. Situational Seekers demonstrated episodic engagement patterns, where trust, data-sharing, and help-seeking were highly context-dependent, preferring fast, low-effort interactions when needed. Conclusions: Men's uptake of DMH screening questionnaires is influenced by a combination of social, psychological, and usability factors. Effective design should integrate anonymity, institutional credibility, and user-centered features to support engagement and early mental health detection. Personalized, actionable feedback with transparency, user control, and clear next-step guidance emerged as key drivers of sustained engagement, while poor usability and lack of meaningful feedback led to disengagement. Importantly, the proposed archetypes capture how these factors co-occur in dynamic, context-dependent user profiles, offering a more actionable alternative to one-size-fits-all and demographic approaches for designing DMH questionnaires tailored to male users.
Baroud, S.
Migraine detection and sentiment analysis in healthcare have become increasingly important, particularly with the rise of social media platforms like Twitter, where users often share their personal health experiences. This study presents MASHA (Multi-Agent System for Healthcare Sentiment Analysis), an artificial intelligence (AI)-driven framework that integrates multiple machine learning (ML) models for sentiment analysis of Arabic tweets related to migraines. The system leverages a multi-agent architecture to handle tasks such as data acquisition, pre-processing, model training and real-time decision-making. Key ML models, including Support Vector Machines (SVM), Naive Bayes (NB) and Logistic Regression (LR), are integrated using ensemble techniques, leading to improved classification performance. Experiments conducted on a dataset of Arabic tweets demonstrate that MASHA outperforms traditional methods, achieving an accuracy of 90.0% and an F1-score of 89.46%. Moreover, the system's scalability and flexibility make it suitable for real-time public health monitoring, offering valuable insights into patient experiences and public sentiment regarding healthcare services. MASHA's adaptability suggests its potential application for analysing other healthcare-related conditions, reinforcing the system's scalability and broader relevance. Future work will focus on incorporating deep learning (DL) models and expanding the dataset with content from additional social media platforms.
Giblett, M. J.; Babikian, Y.; Jhala, D. J.; Medland, S. E.
Pharmacogenomics (PGx) offers a pathway towards personalised medicine, which relies on health consumer involvement in making informed decisions. As consumers increasingly seek health information online, high-quality digital resources are essential to support informed consent and shared decision making. The complexity of PGx and widespread limitations in health literacy raise concerns about whether existing consumer-facing online PGx resources are understandable and sufficiently comprehensive. This study evaluates the readability, visual design, and informational quality of publicly available online written PGx health information. Twenty-three webpages met inclusion criteria. The mean readability corresponded to approximately 15 years of formal education (university level), substantially exceeding the Australian Government's recommended Year 7 reading level for public health materials. Informational quality was generally low, with most webpages being rated as poor or very poor. In contrast, visual design quality was relatively strong, with webpages achieving on average around three-quarters of the criteria. Although the visual presentation of PGx webpages is generally professional, their high reading difficulty and limited discussion of treatment choices and uncertainties reduce their usefulness for health consumer education. Improving readability, clearly communicating risks and limitations, and incorporating decision-support features may enhance the ability of online resources to support informed consent and shared decision making.
Jiang, S. Y.; Roche, T. R.; Cybulski, K.; Dugac, G.; Meier, L.; Tangel, V. E.; Ebensperger, M.; Maskos, A.; Tucci, M.; Noethiger, C. B.; Kalisch, M.; Turnbull, Z. A.; Tscholl, D. W.
Perioperative patient monitoring requires clinicians to integrate multiple physiological data streams under time pressure and frequent interruptions. Conventional monitors predominantly present vital signs as separate numerical values and waveforms, which must be sequentially interpreted and mentally integrated, imposing substantial cognitive demands. Audible alarms are intended to enhance safety but contribute to alarm fatigue and increased workload. Time spent outside predefined safe ranges for key physiological variables and excessive alarm burden are associated with adverse outcomes, motivating approaches that support earlier detection and improved situation awareness without increasing cognitive load. The Philips Visual Patient Avatar is an avatar-based visualisation technology displayed on the patient monitor that supports clinicians' situation awareness by integrating multiple vital signs and sensor states into a single animated virtual patient, while retaining conventional numerical displays. Although laboratory, simulation and qualitative studies suggest benefits of avatar-based monitoring, its impact on objective monitoring outcomes has not been systematically quantified.
Rim, J.; Xu, Q.; Tang, X.; Pinkerton, C.; Guo, Y.; Qu, A.
Background Wearable-based studies have largely examined activity and sleep using static summaries or single time windows, potentially missing how chronic patterns and recent behavioral changes jointly relate to depressive symptom severity. We evaluated whether combining long-term habitual behavior with short-term dynamics improves characterization of moderate-to-severe depressive symptoms. Methods We analyzed Fitbit data from All of Us participants with Patient Health Questionnaire-9 (PHQ-9) assessments, defining moderate-to-severe symptoms as PHQ-9 ≥ 10 (N=248). Logistic regression evaluated long-term measures (past-year step count and awake time after sleep onset) and short-term dynamics (30-day step decline and 30-day sleep duration variability), adjusting for demographics. Performance was assessed via repeated stratified 10-fold cross-validation. Results Thirty percent of participants (n = 74) had moderate-to-severe depressive symptoms. Higher long-term step count was associated with lower odds of elevated symptoms (OR = 0.75 per 1,000 steps/day), greater awake time after sleep onset with higher odds (OR = 1.27 per 1%), a 30-day step decline with higher odds (OR = 2.70), and greater 30-day sleep variability with higher odds (OR = 1.07 per percentage point). Short-term dynamics provided complementary information beyond long-term measures alone. The combined model achieved the highest discrimination (area under the curve [AUC] = 0.80 vs. 0.73 demographics-only), though findings should be interpreted as exploratory given the modest sample size. Limitations The sample was modest in size (N = 248), PHQ-9 reflects symptom severity rather than clinical diagnosis, causal inference is not possible given the cross-sectional outcome assessment, and Fitbit users may not represent broader populations. 
Conclusions Long-term behavioral patterns and short-term changes in activity and sleep were associated with depressive symptom severity, supporting wearable-derived measures as potential adjunctive markers in mental health research.
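The odds ratios in the abstract above are reported per unit of each predictor (for example, per 1,000 steps/day). As a reading aid, the sketch below shows how a per-unit odds ratio from a logistic regression scales to a larger change in the predictor, assuming the log-odds are linear in that predictor. The function name `scale_odds_ratio` is hypothetical, and the scaled value is simple arithmetic on the reported figure, not a result from the study.

```python
import math


def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units` in the predictor.

    In logistic regression OR = exp(beta), so a change of k units
    multiplies the odds by exp(k * beta) = OR ** k.
    """
    beta = math.log(or_per_unit)
    return math.exp(units * beta)


# Reported OR = 0.75 per 1,000 steps/day; implied OR for 2,000 steps/day
# under the linearity assumption is 0.75 squared.
or_2000_steps = scale_odds_ratio(0.75, 2)
```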
Hu, D.; Flores, D.; Flores, L.; Chien, R.; Lam, K.; Chow, E.; Guo, Y.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.
Ambient AI documentation systems rely on automatic speech recognition to transcribe patient-provider conversations before generating clinical notes. However, little empirical evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-method heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations involving Spanish-English and Mandarin-English code-switching. Quantitative analyses measured mixed error rate (MER) and code-switching detection. Overall MER was low, with a median of 4% and less variation in Spanish-English conversations, and 9% in Mandarin-English conversations, but with outliers reaching 67%. The system generally detected language switches reliably, although deletions occurred frequently in Mandarin-English transcripts at switch points. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic language translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI clinical documentation systems to support multilingual providers in delivering care for linguistically diverse populations.
Kurt, F.; Subasi, S. N.; Yakisan, E. S.; Subasi, A.
Background: Wearable technologies enable scalable and continuous monitoring of emotional states through passive sensing of physiological and behavioral signals. However, conventional learning approaches often struggle to model the complex temporal, contextual, and relational dependencies underlying human emotions. To address these limitations, we propose a graph-based framework that represents multimodal wearable observations as heterogeneous knowledge graphs enriched with semantic information derived from Large Language Models (LLMs), enabling richer contextual understanding beyond raw sensor measurements. Methods: We constructed a heterogeneous knowledge graph using multimodal Fitbit physiological signals and affective self-report data collected from 45 users. Mood prediction and emotion detection were formulated as both binary and ternary node classification tasks. We evaluated five baseline heterogeneous Graph Neural Network (GNN) architectures and compared them with the proposed Semantically Gated Augmented Graph Neural Network (SeGA-GNN) framework, which dynamically integrates LLM-generated semantic embeddings into graph representations through a gated cross-modal fusion mechanism. Results: The baseline GNN models achieved strong performance, with classification accuracies ranging from 0.7525 to 0.9739 for binary classification and 0.6249 to 0.9699 for ternary classification. The proposed SeGA framework consistently improved predictive performance across most architectures. In particular, semantic augmentation transformed the HAN model from moderate baseline performance into near-perfect emotion recognition capability, achieving SeGA-HAN Accuracy = 0.9988 and AUC = 1.0000 for binary classification and Accuracy = 0.9979 and AUC = 1.0000 for ternary classification. 
Discussion and Conclusion: Integrating LLM-derived semantic contextualization into heterogeneous graph learning enables effective modeling of contextual information that is not directly captured by wearable physiological signals alone. The proposed SeGA-GNN framework demonstrates that adaptive semantic fusion substantially improves the accuracy, robustness, and interpretability of wearable-based emotion detection. These findings establish a promising direction for next-generation wearable affective computing systems and intelligent emotion-aware applications.
Shah, A.; Mehta, A.; Bhensdadia, C. K.
Mental health challenges among university students have increased due to academic pressure, lifestyle changes, and continuous digital engagement. Existing approaches for mental health assessment often rely either on self-reported psychological scales or isolated behavioral indicators, limiting their ability to capture complex temporal and contextual patterns. This study proposes an interpretable multimodal framework for student mental health risk assessment using behavioral sensing, academic information, ecological momentary assessments (EMA), and psychometric survey data. A bidirectional Long Short-Term Memory autoencoder is employed to learn latent temporal representations from day-level behavioral sequences, while graph embeddings capture structural relationships among students using similarity-based neighborhood graphs. These representations are fused with academic and survey-derived features and reduced using Principal Component Analysis and Uniform Manifold Approximation and Projection. K-means clustering is then applied to identify behaviorally distinct student groups. Experimental analysis on the StudentLife dataset demonstrates meaningful clustering performance with a Silhouette Score of 0.4209 and Adjusted Rand Index stability of 0.6869. The identified clusters correspond to low-risk, moderate-risk, and high-risk behavioral profiles. To improve interpretability and practical usability, a fuzzy inference system is introduced to compute mental risk, academic risk, and wellbeing indices using psychometric indicators including PHQ-9, PSS, PANAS, VR-12, and Big Five personality traits. The results demonstrate the potential of combining multimodal behavioral modeling with interpretable fuzzy reasoning to support early mental health risk assessment in educational settings.
Gatto, J.; Yang, J.; Seegmiller, P.; Rahat, R.; Burdick, T.; Preum, S. M.
Patient portal messaging has become a primary channel for asynchronous clinical communication and spans a wide range of content, from symptom reports and medication concerns to administrative requests. Despite this volume and diversity, there is no formal representation for what a portal message contains: no vocabulary for the clinical and administrative events it describes, or for the attributes of those events that the patient has actually disclosed. Without such a representation, it is difficult to systematically analyze portal communication, assess message completeness, or build downstream tools that depend on structured input, such as automated triage, response drafting, and follow-up question generation. A clinical event schema, grounded in real portal messages and reviewed by clinicians, would provide this missing foundation. We introduce a clinical event ontology for patient portal messages, containing 8 event types and 70 roles that span clinical content (symptoms, medications, diagnostic tests, treatment responses, patient history) and administrative content (medical needs, logistics, social factors). The ontology was developed iteratively in collaboration with clinical experts and through human evaluation. As a downstream application, we use the ontology to characterize the event types and roles most frequently sought in clinician follow-up questions, which provides insight into what clinicians ask about when reading portal messages.
Heidenreich, B. M.
Background. Complex cases in specialized pediatric care require consistent adherence to evidence-based clinical pathways and protocols to ensure safe, high-quality, and equitable care. Currently, clinical pathways and supporting documentation are frequently distributed across multiple platforms, leading to fragmentation. Human-centered design principles can guide the development of healthcare technologies that minimize cognitive load and support rapid, efficient access to relevant information in clinical settings. The purpose of this study is to design and evaluate perceived usability of a pediatric cardiac center digital guideline management system that is embedded within the electronic health record leveraging human-centered design. Methods. This study used a mixed-methods usability evaluation to assess a digital guideline management system prototype embedded into clinical workflow. Through human-centered design principles, the prototype provides a centralized digital document library that organizes cardiac-specific clinical pathways, guidelines, procedures, and related resources. A small but diverse sample, encompassing a wide variety of roles and clinical areas within the pediatric cardiac center, was recruited to evaluate the perceived usability of the prototype. Usability was evaluated by stakeholders using the validated System Usability Scale (SUS) with additional optional questions to understand perceptions of the information architecture and clinical value. Results. Preliminary usability testing showed a mean SUS composite score of 76.5, indicating above average usability. Questions related to the complexity of the system and user confidence received high scores across participants. Lower scores were observed for questions related to usage frequency and ability to learn the system very quickly. Conclusion. 
Leveraging human-centered design when building a digital guideline management system embedded within clinical workflow revealed positive perception from participants. By centralizing access to clinical resources, this prototype can reduce current-state fragmentation. Further evaluation of larger samples is needed to develop a list of future recommendations.
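The abstract above reports a mean System Usability Scale (SUS) composite score. For readers unfamiliar with how a composite is derived from the ten 1-to-5 item responses, the sketch below applies the standard SUS scoring rule. The function name `sus_score` and the sample responses are illustrative and are not data from the study.

```python
def sus_score(responses):
    """Compute a SUS composite (0-100) from ten item responses on a 1-5 scale.

    Odd-numbered items (1, 3, 5, 7, 9) are positively worded: contribution = r - 1.
    Even-numbered items (2, 4, 6, 8, 10) are negatively worded: contribution = 5 - r.
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, r in enumerate(responses):
        # index 0 is item 1 (odd-numbered), index 1 is item 2 (even-numbered), ...
        total += (r - 1) if index % 2 == 0 else (5 - r)
    return total * 2.5


# Hypothetical participant who answers neutrally (3) on every item.
neutral = sus_score([3] * 10)
# Hypothetical participant with the most favorable answer on every item.
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
```

A study-level figure such as the one reported would be the mean of these per-participant composites.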
Mangut, E.; Wallace, R.
Background: Professionalism and effective communication are foundational determinants of patient safety and quality of care. Unprofessional behaviors frequently serve as active precursors to adverse clinical events. However, proactive organizational surveillance is often hindered because incident feedback exists primarily as unstructured, free-text data. This study aimed to develop and validate a Natural Language Processing (NLP) pipeline and interactive dashboard to proactively monitor the "professionalism climate" within NYC Health + Hospitals, the largest municipal healthcare delivery system in the United States. Methods: A high-fidelity synthetic dataset (N=400) was computationally generated to safely mirror historical incident logs across 11 acute facilities without utilizing Protected Health Information (PHI). A rule-based NLP pipeline was developed in R utilizing the tidytext package. Unstructured narrative feedback was tokenized and classified into three core domains: Respect, Safety, and Communication. To validate the pipeline's accuracy, a 25% random stratified sample (n=100) was evaluated against independent, blinded manual coding performed by two reviewers, with inter-rater reliability measured via Cohen's Kappa. Finally, an interactive Tableau dashboard was developed to operationalize and visualize these metrics for ongoing surveillance. Results: The NLP algorithm achieved an overall accuracy of 85.8% (95% CI: 79.0-92.6), with 81.2% sensitivity and 88.9% specificity. The highest domain-specific performance was observed in Communication (88.0% accuracy). Manual validation demonstrated strong inter-rater reliability (k=0.84). Operational analysis via the dashboard revealed that 61.8% of reports occurred during the Tour 2 shift (15:00 to 23:00), aligning with peak operational volume. 
Furthermore, Respect-related feedback was reported at a disproportionately high frequency during the Tour 3 shift (23:00 to 07:00), accounting for over 50.7% of overnight feedback submissions. Conclusion: Rule-based NLP successfully transforms qualitative healthcare feedback into structured, actionable intelligence with high specificity. Integrating this pipeline into operational dashboards transitions safety culture surveillance from a reactive, manual exercise to a proactive, scalable system, enabling targeted, data-driven interventions by hospital leadership.
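The abstract above measures inter-rater reliability between two manual coders with Cohen's kappa. The authors worked in R; the sketch below uses Python for illustration only and shows the standard kappa computation for two raters assigning one label per item. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_o - p_e) / (1 - p_e)


# Toy example using the three domains named in the abstract.
a = ["Respect", "Safety", "Communication", "Respect"]
b = ["Respect", "Safety", "Communication", "Respect"]
perfect = cohens_kappa(a, b)

# Agreement no better than chance gives kappa = 0.
chance = cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0])
```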
Bermejo-Pelaez, D.; Darias, O.; Pastor, L.; Valles, R.; Diez, N.; Lin, L.; Garcia-Villena, J.; Cuadrado, D.; Vladimirov, A.; Alamo, E.; Postigo, M.; Rodriguez-Dominguez, M.; Canton, R.; Rodriguez-Tudela, J. L.; Alastruey Izquierdo, A.; Bohorquez, L. C.; Rubio, J. M.; Dacal, E.; Luengo-Oroz, M.
Introduction. Lateral flow assays (LFAs) are indispensable rapid diagnostic tools in healthcare, enabling point-of-care diagnosis critical for patient management and supporting disease burden assessment and surveillance when results are properly recorded. However, misinterpretation errors and unreported cases remain a concern. A quality-assured, affordable AI-powered tool supporting decision-making during result interpretation could promote proper disease monitoring and epidemiological surveillance. Here, we describe the performance of a universal AI model to digitize and interpret results from multiple LFA types through a smartphone application, a step that could ultimately enable standardized and digitally reportable test outcomes. Methods. The AI algorithm was evaluated in 17 LFA types, including both 2-band and 3-band tests for different diseases and manufacturers. The model was trained on a dataset of 22,576 images captured under diverse lighting conditions with different smartphone models and using a custom mobile application, TiraSpot (Spotlab, Madrid, Spain). To assess generalizability, a leave-one-out cross-validation was applied, wherein each LFA type was iteratively excluded from training and used for testing. Model performance was evaluated using bootstrapping on the inference dataset. Results. In the assessment of the model's ability to generalize to new LFA types not previously analyzed (not included during development), the model achieved an overall AUC of 94.3% for second band detection. This overall performance was enhanced to 99.3% (Sensitivity=98.6%; Specificity=98%) after training with 50 images of each LFA type, highlighting the benefit of additional data for specific LFA types. For the third band detection, where less training data was available, the system achieved an overall AUC of 83.9% for unseen LFAs, improving to 94.2% (Sensitivity=92.9%; Specificity=87.9%) after training with 50 images of each LFA type. Conclusion. 
This system demonstrates the feasibility of an AI-powered universal digital reader for interpreting LFA results from diverse test types using smartphone-captured images. Its compatibility with standard smartphones makes it a universal tool, enabling reliable LFA interpretation across devices and settings. By standardizing test interpretation and digitizing results, this tool could support decision making in result interpretation, enhancing epidemiological surveillance, particularly in resource-limited settings. Its adaptability across various infections highlights its potential to improve diagnostic consistency and support disease management in diverse healthcare settings.
Tran, B. D.; Hu, D.; Kim, S.; Guo, Y.; Mangu, R.; Reynolds, T. L.; Lafata, J. E.; Tai-Seale, M.; Zheng, K.
Ambient clinical intelligence (ACI) systems use automatic speech recognition (ASR) to capture patient-provider conversations for downstream clinical documentation. However, many ASR evaluations are conducted under controlled conditions using specialized hardware. We evaluated how recording devices influence the transcription performance of contemporary ASR engines applied to clinical dialogue. Thirty-five primary care encounters were re-enacted from transcribed conversations and recorded using five devices simultaneously: smartphone, laptop microphone, portable recorder, clip-on microphone, and desktop microphone. Six ASR engines were evaluated using word error rate (WER), clinical concept extraction precision and recall, and sentence-level semantic similarity. Median WER ranged from 16.7% to 20.7% across engines. Engine choice produced larger variation in transcription performance than recording device, although device-related differences were statistically significant. Overall, contemporary ASR engines demonstrated relative robustness to consumer-grade recording hardware, suggesting that model selection may have greater impact on transcription performance than recording device configuration in real-world ACI deployments.
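Word error rate, the headline metric in the abstract above, is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below shows that standard definition; the lowercasing and whitespace tokenization are simplifying assumptions, and the study's actual text normalization is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("reports" -> "report") and one insertion ("today")
# against a five-word reference.
wer = word_error_rate(
    "the patient reports chest pain",
    "the patient report chest pain today",
)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.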
Bakumenko, A.; Smith, D. H.; Hoelscher, J.
Earlier ICU mortality prediction is more clinically useful because it can identify high-risk patients while treatment decisions can still change. Yet most models are trained on data from a fixed time window, so it is unclear whether a model trained on the first 48 hours of ICU data remains reliable when used earlier in the ICU stay. We evaluated a multimodal ICU mortality model trained once at 48 hours and then applied unchanged at 6, 12, 24, and 48 hours on MIMIC-III. The model combines an LSTM for physiological time-series data, a fine-tuned ClinicalModernBERT model for clinical notes, and a logistic regression fusion layer. Performance remained strong at earlier time points, suggesting that useful mortality prediction is possible earlier in the ICU stay even without retraining. At 6 hours, the model achieved AUROC 0.777 and remained well-calibrated (ECE 0.038) without any recalibration, and it outperformed both single-modality models at every horizon. The multimodal benefit was most evident at earlier horizons, when physiological data were sparse: agreement between the two specialists dropped by more than half from 48 to 6 hours, while the median contribution from clinical notes increased from 37% to 49%. A Bayesian version of the fusion layer showed that uncertainty decreased for survivors as more data accumulated but remained high for non-survivors; the most uncertain cases were up to 4.9 times more likely to be non-surviving patients. Continuous hourly analyses further showed that clinical notes provide stable context between documentation events. Simply carrying forward the most recent note matched or outperformed note-decay and documentation-gap alternatives. These results suggest that a multimodal ICU mortality model trained on 48 hours of data can provide trustworthy earlier predictions without retraining, while also identifying the cases that remain hardest to interpret.
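The calibration figure quoted above is an expected calibration error (ECE). One common way to compute it for a binary outcome is sketched below: predictions are grouped into equal-width probability bins, and the gap between mean predicted risk and observed event rate is averaged with weights proportional to bin size. The bin count of 10 and the equal-width binning are assumptions; the abstract does not state which variant was used.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary outcomes.

    `probs` are predicted probabilities of the event (e.g. mortality) and
    `labels` are the observed 0/1 outcomes. Equal-width bins are assumed.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_predicted - observed_rate)
    return ece


# Toy example: each occupied bin is off by 0.1, so the weighted gap is 0.1.
toy_ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
```

A lower value means predicted risks track observed event rates more closely; the result depends on the binning scheme, so values are only comparable under the same settings.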
Chou, N. A.; Baek, Y.; Feng, F.; Lu, K.; Choi, E. Y.; Fisher, H. M.; Malek, D.; Jammal, A.; Somers, T. J.; Muir, K. W.; Medeiros, F. A.; Berchuck, S. I.
Purpose: Psychological distress is highly prevalent in glaucoma and is associated with worse adherence, reduced quality of life, and faster disease progression. However, distress is rarely assessed in ophthalmology settings due to time, workflow, and staffing constraints. We evaluated two artificial intelligence (AI)-based screening strategies designed to efficiently identify distressed primary open-angle glaucoma (POAG) patients during routine care, aiming to achieve effective, resource-conscious, low-burden clinical screening. Design: Hybrid retrospective cohort and prospective cross-sectional study. Participants: The retrospective cohort included >3,000 POAG patients from the Duke Ophthalmic Registry. Prospective validation was conducted in a separate cohort of 300 POAG patients who completed patient-reported distress screening. Methods: Using retrospective data, a neural network model was trained to predict an electronic health record (EHR)-derived computable phenotype of distress ("silver standard"). Prospective validation used the 8-item Patient Health Questionnaire (PHQ-8) as the "gold standard." Three screening strategies were compared against PHQ-8: (1) universal PHQ-2 screening (two-item screener administered to all patients), (2) AI-only screening (fully automated EHR-based screener), and (3) sequential screening (only patients flagged as high risk by the AI screener completed the PHQ-2). Performance metrics included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and screening burden. Main Outcome Measures: Sensitivity; specificity; PPV; NPV; accuracy; proportion of patients requiring secondary screening (screening burden). Results: Distress prevalence was 17% (PHQ-8 > 6). Universal PHQ-2 screening (> 0) achieved high sensitivity (0.96) but lower specificity (0.73) and PPV (0.41), while requiring screening of all patients. The AI-assisted sequential approach substantially reduced screening burden while maintaining strong diagnostic performance. By administering PHQ-2 to ~25% of patients, sequential screening achieved sensitivity 0.64, specificity 0.93, PPV 0.64, NPV 0.93, and accuracy 0.88, representing a ~50% increase in PPV compared to PHQ-2 alone. AI-only screening reduced burden further but did not achieve comparable sensitivity or predictive performance. Conclusions: AI-assisted sequential screening enables scalable, resource-efficient identification of psychological distress in glaucoma care, substantially reducing screening burden while preserving clinically meaningful performance. This framework offers a practical pathway for integrating distress screening into routine ophthalmology workflows and improving the identification and referral of at-risk patients.
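PPV and NPV depend on prevalence as well as on sensitivity and specificity, which is why the higher-specificity sequential strategy yields a better PPV at 17% prevalence. The sketch below applies the standard Bayes-rule relationship to the rounded figures reported in the abstract; the function name is illustrative, and because the inputs are rounded the outputs only approximate the published PPV and NPV.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


PREVALENCE = 0.17  # distress prevalence reported in the abstract

# Universal PHQ-2 screening: sensitivity 0.96, specificity 0.73.
universal_ppv, universal_npv = predictive_values(0.96, 0.73, PREVALENCE)

# Sequential screening: sensitivity 0.64, specificity 0.93.
sequential_ppv, sequential_npv = predictive_values(0.64, 0.93, PREVALENCE)

print(f"universal  PPV={universal_ppv:.2f} NPV={universal_npv:.2f}")
print(f"sequential PPV={sequential_ppv:.2f} NPV={sequential_npv:.2f}")
```

Under these inputs the recomputed values land within about 0.02 of the reported PPVs (0.41 and 0.64) and the reported sequential NPV (0.93), consistent with rounding in the published sensitivity and specificity.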
Fisher, H. M.; Chou, N. A.; Falkovic, M.; Parnell, H.; Makarushka, C.; Fish, L. J.; Plumb Vilardaga, J.; Medeiros, F. A.; Somers, T. J.; Muir, K. W.; Berchuck, S. I.
Objective: To assess the feasibility and acceptability of VISION-ACT, a standalone mobile app psychosocial intervention for psychological distress in individuals with primary open-angle glaucoma (POAG). Design: Single-arm pilot. Participants: Patients (N=28) with a diagnosis of POAG, self-reporting at least mild (>3) distress on the 4-item Patient Health Questionnaire, were recruited from the Duke Eye Center between April 2025 and December 2025. Methods: Patients (n=28) provided consent and completed a baseline (A1) self-report assessment. VISION-ACT comprised 6 weekly modules. Follow-up self-report assessments occurred at post-intervention (A2) and 1-month post-intervention (A3) and included measures of psychological distress, vision and health-related quality of life, psychological flexibility, disease acceptance, self-efficacy for symptom management, mindfulness, and social support. Participants were invited to complete an exit interview at 1-month post-intervention to gather qualitative feedback on the VISION-ACT protocol. Descriptive statistics were used to assess feasibility and acceptability metrics, and patterns of pre-post change on patient-reported outcomes were explored with linear mixed models using R Statistical Software. Main Outcome Measures: Feasibility (target accrual (n=25) in 12 months, <20% attrition at post-intervention); Acceptability (>75% reporting use of VISION-ACT skills or ideas at post-intervention, >80% reporting M>3.00/4.00 at post-intervention on the Client Satisfaction Questionnaire); Psychological Distress (Hospital Anxiety and Depression Scale [HADS], Subjective Units of Distress Scale [SUDS]). Results: VISION-ACT was highly feasible; the accrual target was surpassed (N=28) in 6 months, and attrition was low (3.85%) at post-intervention (A2). Acceptability was strong, with 100% of participants reporting use of VISION-ACT skills or ideas at A2 and M=3.27/4.00 intervention satisfaction. Adherence was high, with 88.5% of participants completing all six VISION-ACT modules. Pre-post change patterns were in the expected direction for psychological distress (HADS A1 M=13.88, A2 M=11.21; SUDS A1 M=35.54, A2 M=26.46) and all other patient-reported outcomes across baseline, post-intervention, and 1-month post-intervention assessments. Data on participant perspectives highlighted valuable aspects of VISION-ACT and areas for refinement. Conclusions: The robust feasibility and acceptability data seen here support a fully powered, randomized trial to evaluate the efficacy of VISION-ACT for reducing psychological distress and improving related patient-reported and clinical outcomes.